Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation
Transformers have shown great success in medical image segmentation. However,
transformers may exhibit a limited generalization ability due to the underlying
single-scale self-attention (SA) mechanism. In this paper, we address this
issue by introducing a Multi-scale hiERarchical vIsion Transformer (MERIT)
backbone network, which improves the generalizability of the model by computing
SA at multiple scales. We also incorporate an attention-based decoder, namely
Cascaded Attention Decoding (CASCADE), for further refinement of multi-stage
features generated by MERIT. Finally, we introduce an effective multi-stage
feature mixing loss aggregation (MUTATION) method for better model training via
implicit ensembling. Our experiments on two widely used medical image
segmentation benchmarks (i.e., Synapse Multi-organ, ACDC) demonstrate the
superior performance of MERIT over state-of-the-art methods. Our MERIT
architecture and MUTATION loss aggregation can be used with downstream medical
image and semantic segmentation tasks.
Comment: 19 pages, 4 figures, MIDL 202
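The abstract does not spell out how MUTATION mixes stage outputs. The sketch below illustrates one possible reading, under the assumption that the prediction maps of every non-empty subset of decoder stages are summed and a loss is computed on each mixture, so that n stages yield 2^n - 1 loss terms. The names `aggregated_loss` and `mse` are hypothetical, and the mean squared error is only a placeholder for whatever segmentation loss is actually used.

```python
from itertools import combinations


def mse(pred, target):
    """Placeholder loss (assumption): mean squared error over flat maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def aggregated_loss(stage_preds, target, loss_fn=mse):
    """Sum loss_fn over every non-empty subset of stage predictions.

    stage_preds: list of flat prediction maps, one per decoder stage.
    Each subset is mixed by element-wise addition (an assumed mixing rule).
    """
    total = 0.0
    for k in range(1, len(stage_preds) + 1):
        for subset in combinations(stage_preds, k):
            # Element-wise sum of the prediction maps in this subset.
            mixed = [sum(values) for values in zip(*subset)]
            total += loss_fn(mixed, target)
    return total
```

With four stages this evaluates the loss on 15 mixtures, which is one way the "implicit ensembling" mentioned in the abstract could arise.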
G-CASCADE: Efficient Cascaded Graph Convolutional Decoding for 2D Medical Image Segmentation
In recent years, medical image segmentation has become an important
application in the field of computer-aided diagnosis. In this paper, we are the
first to propose a graph convolution-based decoder, namely the Cascaded Graph
Convolutional Attention Decoder (G-CASCADE), for 2D medical image segmentation.
G-CASCADE progressively refines multi-stage feature maps generated by
hierarchical transformer encoders with an efficient graph convolution block.
The encoder utilizes the self-attention mechanism to capture long-range
dependencies, while the decoder refines the feature maps and preserves long-range
information thanks to the global receptive fields of the graph convolution block.
Rigorous evaluations of our decoder with multiple transformer encoders on five
medical image segmentation tasks (i.e., Abdomen organs, Cardiac organs, Polyp
lesions, Skin lesions, and Retinal vessels) show that our model outperforms
other state-of-the-art (SOTA) methods. We also demonstrate that our decoder
achieves better DICE scores than the SOTA CASCADE decoder with 80.8% fewer
parameters and 82.3% fewer FLOPs. Our decoder can easily be used with other
hierarchical encoders for general-purpose semantic and medical image
segmentation tasks.
Comment: 13 pages, IEEE/CVF Winter Conference on Applications of Computer
Vision (WACV 2024)
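For readers unfamiliar with the DICE score mentioned above, the following is a minimal sketch of the standard Dice coefficient, 2|P ∩ T| / (|P| + |T|), on flat binary masks. The function name `dice_score` and the treatment of two empty masks as a perfect match are assumptions for illustration, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks given as lists of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Overlap: positions where both masks are 1.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A score of 1.0 means the predicted and reference masks coincide, and 0.0 means they share no foreground pixels.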